The two words I used to search were “lake boat”. Since I love water and lakes, I wanted to get lake pictures. I first searched “silent lake”, but that search did not return over 200 pictures. Among its results I found a nice picture of a boat on a lake, so I changed my search words to “lake boat”.
Tags: The common tags are “lake”, “boat” and “sunset”.
Page orientation: Mostly landscape.
Colors: The main color is blue, and orange is the next most common color.
knitr::kable(photo_data %>% select(pageURL))
meanLikesProportion <- photo_data$likesProportion %>% mean(na.rm=TRUE)
total_photos <- nrow(photo_data)
meanViews <- photo_data$views %>% mean(na.rm=TRUE)
boat_photos <- sum(str_detect(photo_data$tags, "boat"))
The mean proportion of likes for the selected photos is 2.2%.
A total of 50 photos were selected for analysis.
Among the selected photos, 37 have “boat” tags.
The mean number of views for the selected photos is 6043.
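One detail worth keeping in mind when reading the “boat” count: `str_detect(tags, "boat")` matches the pattern anywhere in the tag string, so a photo tagged only “sailboat” would also be counted. The sketch below uses a made-up three-element vector (not the project data) to contrast that substring count with a count of photos whose tag list contains the exact tag “boat”. The names `toy_tags`, `substring_count`, and `exact_count` are hypothetical.

```r
library(stringr)

# Hypothetical tag strings in the same "tag, tag, tag" layout as the data
toy_tags <- c("sailboat, lake, water", "boat, river, forest", "lake, sunset, sky")

# Substring match: counts "sailboat" as well as "boat"
substring_count <- sum(str_detect(toy_tags, "boat"))

# Exact match: split each string into individual tags, then test membership
exact_count <- sum(sapply(str_split(toy_tags, ", "), function(x) "boat" %in% x))

substring_count
exact_count
```

Which of the two counts is appropriate depends on whether tags such as “sailboat” should be treated as boat photos.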
# Separate rows on ", " so that each row holds a single tag
tags_new <- photo_data%>%
separate_rows(tags, sep = ", ")
tags_new
## # A tibble: 150 × 7
## previewURL pageURL tags likesProportion pageOrientation sizeLevel views
## <chr> <chr> <chr> <dbl> <chr> <chr> <dbl>
## 1 https://cdn.pi… https:… sail… 1.86 landscape mid 7631
## 2 https://cdn.pi… https:… lake… 1.86 landscape mid 7631
## 3 https://cdn.pi… https:… wate… 1.86 landscape mid 7631
## 4 https://cdn.pi… https:… boat 0.903 landscape large 28227
## 5 https://cdn.pi… https:… river 0.903 landscape large 28227
## 6 https://cdn.pi… https:… fore… 0.903 landscape large 28227
## 7 https://cdn.pi… https:… boat 1.00 landscape large 4089
## 8 https://cdn.pi… https:… sea 1.00 landscape large 4089
## 9 https://cdn.pi… https:… yacht 1.00 landscape large 4089
## 10 https://cdn.pi… https:… lake… 2.15 landscape large 1810
## # ℹ 140 more rows
# Count how often each tag occurs
tags_count <- tags_new %>%
  group_by(tags) %>%
  summarise(n())
# Rename the second column, "n()", to "freq"
tags_count <- tags_count %>%
  rename(freq = 2)
# Filter the top 5 tags from tags_count
top_tags_counts <- tags_count %>%
filter(freq %in% head(sort(freq, decreasing = TRUE), 5))
# Plot the bar plot
ggplot(top_tags_counts, aes(x = tags, y = freq)) +
geom_bar(stat = "identity", fill = "skyblue") +
labs(x = "Tags", y = "Frequency", title = "Top 5 Most Common Tags")
I demonstrated creativity by creating a plot showing the top 5 most common tags. To get separate tags, functions covered in Lab3A and Lab3B were useful: separate_rows() to split the tags variable on “, ”, along with group_by(), rename(), and filter().
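The tag-counting steps can be sketched on a small made-up data frame. This is an illustrative alternative rather than the code used in the report: `toy_photos` and its values are invented, `count()` stands in for the group_by(), summarise(), and rename() sequence, and `slice_max()` stands in for the filter() step. Both `slice_max()` (by default) and the report's filter() condition keep tags that tie at the cut-off, so a “top 5” can contain more than five rows.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: one row per photo, tags stored as one comma-separated string
toy_photos <- tibble(
  pageURL = c("photo1", "photo2", "photo3"),
  tags    = c("lake, boat, sunset", "lake, boat, water", "lake, sea, yacht")
)

toy_tag_counts <- toy_photos %>%
  separate_rows(tags, sep = ", ") %>%   # one row per (photo, tag) pair
  count(tags, name = "freq")            # frequency of each distinct tag

# Keep the 2 most frequent tags; tags tied at the cut-off would also be kept
top_toy_tags <- toy_tag_counts %>%
  slice_max(freq, n = 2)

top_toy_tags
```

Here “lake” appears three times and “boat” twice, so those two tags are returned in descending order of frequency.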
In Module 3, I gained extensive knowledge in manipulating data from CSV and JSON formats by completing the lab tasks and the project. It was interesting to discover how I could craft a new data frame tailored to my exploration needs by manipulating and summarizing data. Additionally, I learned to create calculated variables derived from dataset exploration. Using APIs in the project further deepened my understanding of their functionality and application. This experience not only enhanced my data manipulation skills but also broadened my comprehension of APIs, underscoring their importance in data analysis. Lastly, reusing some functions from previous labs was a good way to revise.
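As a small illustration of calculated variables, the sketch below derives a likes percentage and a three-level size category on invented data. The `toy` data frame is hypothetical, and splitting sizes at the 1/3 and 2/3 quantiles with `cut()` is an assumption made for this example, not necessarily the cut-offs used in the project.

```r
library(dplyr)

# Hypothetical photo metadata
toy <- tibble(
  likes     = c(10, 50, 5, 80, 20, 40),
  views     = c(1000, 2000, 500, 1600, 4000, 1000),
  imageSize = c(100, 200, 300, 400, 500, 600)
)

# Break points at the minimum, 1/3 quantile, 2/3 quantile, and maximum of imageSize
size_breaks <- quantile(toy$imageSize, probs = c(0, 1/3, 2/3, 1))

toy_derived <- toy %>%
  mutate(
    likesProportion = round(likes / views * 100, 3),   # likes as a percentage of views
    sizeLevel = cut(imageSize, breaks = size_breaks,
                    labels = c("small", "mid", "large"),
                    include.lowest = TRUE)              # include the minimum in "small"
  )

toy_derived
```

Using `cut()` with explicit break points keeps the binning rule in one place, which makes it easier to check which quantile each category boundary corresponds to.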
library(tidyverse)
library(jsonlite)
library(magick) # provides image_read(), image_animate(), and image_write() used below
json_data <- fromJSON("pixabay_data.json")
pixabay_photo_data <- json_data$hits
names(pixabay_photo_data)
quantiles <- pixabay_photo_data %>%
pull(imageSize) %>%
quantile()
# Derive size level, orientation, and likes percentage for each photo.
# Note: quantile() returns the 0%, 25%, 50%, 75%, and 100% quantiles by default,
# so quantiles[1] is the minimum imageSize and quantiles[2] is the 25th percentile.
selected_photos <- pixabay_photo_data %>%
  mutate(sizeLevel = ifelse(imageSize <= quantiles[1], "small",
                     ifelse(imageSize <= quantiles[2], "mid", "large")),
         pageOrientation = ifelse(previewWidth >= previewHeight, "landscape", "portrait"),
         likesProportion = round((likes / views) * 100, 3)) %>%
  select(previewURL, pageURL, tags, likesProportion, pageOrientation, sizeLevel, views) %>%
  filter(likesProportion > quantile(likesProportion, 0.75)) # keep only photos above the 75th percentile of likesProportion
write_csv(selected_photos, "selected_photos.csv")
meanLikesProportion <- selected_photos$likesProportion %>% mean(na.rm=TRUE)
meanViews <- selected_photos$views %>% mean(na.rm=TRUE)
boat_photos <- sum(str_detect(selected_photos$tags, "boat"))
total_photos <- nrow(selected_photos)
selected_summaries <- selected_photos%>%
group_by(pageOrientation)%>%
summarise(n())
animation <- selected_photos %>%
pull(previewURL)%>%
image_read() %>%
image_animate(fps = 5)
animation
image_write(animation, "my_photos.gif")
# Separate rows on ", " so that each row holds a single tag
tags_new <- selected_photos%>%
separate_rows(tags, sep = ", ")
tags_new
# Count how often each tag occurs
tags_count <- tags_new %>%
  group_by(tags) %>%
  summarise(n())
# Rename the second column, "n()", to "freq"
tags_count <- tags_count %>%
  rename(freq = 2)
# Filter the top 5 tags from tags_count
top_tags_counts <- tags_count %>%
filter(freq %in% head(sort(freq, decreasing = TRUE), 5))
# Plot the bar plot
ggplot(top_tags_counts, aes(x = tags, y = freq)) +
geom_bar(stat = "identity", fill = "skyblue") +
labs(x = "Tags", y = "Frequency", title = "Top 5 Most Common Tags")